
[Issue #3527] Modify the logic around the opportunity change tracking table to never delete records #3565

Merged

Conversation

mikehgrantsgov (Collaborator)

Summary

Fixes #3527

Time to review: 30 mins

Changes proposed

Add a new table to track task progress
Rename the queue table and add all existing opportunities to it; any new opportunities that come in are added as well (should basically always be 1:1)
For the index job, load data based on the last_loaded_at date rather than using the has_update field or deleting records as we go (a rough sketch of this follows below)
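To illustrate that last point, here is a minimal sketch of what the date-based load could look like. The model and column names follow the PR discussion, but the query shape itself is an assumption, not the merged code:

from sqlalchemy import select

# Opportunity and OpportunityChangeAudit are the project's models; their
# import paths are omitted here, and the join/column names are assumed.
def fetch_opportunities_updated_since(db_session, last_loaded_at):
    # Pick up every opportunity whose change-audit row was touched since the
    # last successful index load, instead of popping and deleting queue rows.
    stmt = (
        select(Opportunity)
        .join(
            OpportunityChangeAudit,
            OpportunityChangeAudit.opportunity_id == Opportunity.opportunity_id,
        )
        .where(OpportunityChangeAudit.updated_at > last_loaded_at)
    )
    return db_session.scalars(stmt).all()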

Context for reviewers

Cascade deletion of the orphaned relationship remains unchanged
JobTable is meant to be general purpose and can be used, possibly ended, by other tasks.

Additional information

See unit tests



class JobTable(ApiSchemaTable, TimestampMixin):
    __tablename__ = "job"
Collaborator:

Let's have a slightly longer name, what about task_log or something like that?

Collaborator:

Thoughts on this? job_log might also be good?

Also want to avoid naming the table XTable

mikehgrantsgov marked this pull request as ready for review on January 17, 2025, 22:04
Comment on lines 65 to 67
except Exception:
    # Update job status to failed
    self.update_job(JobStatus.FAILED, metrics=self.metrics)
Collaborator:

We might want some tests to verify this works in a few scenarios and that the DB session behaves correctly; two that I can think of that might break here:

  1. An error occurs (unrelated to the DB); the DB session needs to be rolled back first.
  2. An error occurs while doing a DB operation; I think the DB session does get rolled back, but it might be in an unusable state.

Collaborator Author:

@chouinar I think the recent additions to test_task.py cover the scenarios you mention here. If a DB error is thrown in the task, it will roll back the transaction (which I don't think SQLAlchemy does automatically?) and begin a new transaction so the FAILED state can be stored for the job.
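As a rough illustration of the kind of test being described (FailingTask, the run() entry point, the db_session fixture, and the job_status column are hypothetical stand-ins, not necessarily what test_task.py uses):

import pytest
from sqlalchemy.exc import SQLAlchemyError

class FailingTask(Task):
    # Hypothetical task whose work raises a DB error partway through
    def run_task(self):
        raise SQLAlchemyError("simulated DB failure")

def test_db_error_still_marks_job_failed(db_session):
    task = FailingTask(db_session)
    with pytest.raises(SQLAlchemyError):
        task.run()
    # After the rollback and the fresh transaction, the FAILED status
    # should have been persisted for the job record (column name assumed).
    assert db_session.query(JobTable).one().job_status == JobStatus.FAILED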

Collaborator:

SQLAlchemy doesn't rollback automatically, but if it fails to get to the end of the transaction, it should at least fail to commit.
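For anyone following along, a small standalone illustration of that session behavior (simplified, not code from this PR): after a statement fails inside a transaction, the Session refuses further work until it is explicitly rolled back.

from sqlalchemy.exc import SQLAlchemyError

try:
    db_session.execute(some_statement_that_fails)  # hypothetical failing statement
except SQLAlchemyError:
    # Nothing was committed, but the transaction is now in a failed state;
    # subsequent statements (and the final commit) raise PendingRollbackError
    # until we roll back explicitly.
    db_session.rollback()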



Comment on lines 52 to 59
except SQLAlchemyError:
    # Rollback and begin new transaction only for DB-specific errors
    self.db_session.rollback()
    self.db_session.begin()
    raise
except Exception:
    # For non-DB errors, just raise without touching the transaction
    raise
Collaborator:

In what case would we want a non-DB error to still commit changes? If the task is failing, I'd rather we just rollback anyways and not commit. A partial success state seems more problematic, and if any such scenarios do exist, we should handle that on a particular task.

What about a pattern like this:

job_succeeded = True
try:
    self.run_task()
    ...  # timing stuff
except Exception:
    self.db_session.rollback()  # May or may not be necessary?
    job_succeeded = False
    raise
finally:
    job_status = JobStatus.COMPLETED if job_succeeded else JobStatus.FAILED
    with self.db_session.begin():
        self.update_job(job_status, metrics=self.metrics)

Would this hit any issues?

Collaborator Author:

If run_task raises an exception, the timing stuff and metrics would not run. Not sure if that's a big issue, though. I think the rollback() is not necessary here since the task would handle its own tx stuff (right? we are not relying on this task class for tx management?), and the finally block would start its new transaction with db_session.begin.

Otherwise I don't see any other issues here.

Collaborator Author:

I don't think this is quite right yet, but @chouinar if you have any ideas on this, maybe we can pair to make this implementation more elegant.
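One possible shape for that follow-up, sketched here only to capture the idea (attribute and method names are assumptions, and this is not the code that was ultimately merged): timing and status both land in the finally block so they are recorded even when run_task raises.

start = time.perf_counter()
job_succeeded = True
try:
    self.run_task()
except Exception:
    # Roll back so the session is usable for the status update below
    self.db_session.rollback()
    job_succeeded = False
    raise
finally:
    # Duration and final status are recorded whether or not run_task raised
    # (metrics is assumed to be a plain dict here)
    self.metrics["task_duration_sec"] = round(time.perf_counter() - start, 3)
    job_status = JobStatus.COMPLETED if job_succeeded else JobStatus.FAILED
    with self.db_session.begin():
        self.update_job(job_status, metrics=self.metrics)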


chouinar (Collaborator) left a comment:

Just two small things, sorry for the slow reviews on this PR

Comment on lines 49 to 51
logger.info("Starting %s", self.cls_name())
start = time.perf_counter()

Collaborator:

Why did this get moved after the run_task command? Should be the first thing in the try block

Collaborator Author:

Ah, I was moving things around while testing something out. Moved it back.

self.db_session.execute(
    update(OpportunityChangeAudit)
    .where(OpportunityChangeAudit.opportunity_id.in_(processed_opportunity_ids))
    .values(updated_at=get_now_us_eastern_datetime())
)
Collaborator:

In the DB, we use UTC for everything, I would make this call the UTC function, not the eastern one

Collaborator Author:

Using the dateutil here now, thanks
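For reference, the same update with a UTC timestamp, shown here with the standard library directly (the project's own datetime utility is what the PR actually switched to):

from datetime import datetime, timezone
from sqlalchemy import update

# Stamp the processed audit rows with a timezone-aware UTC timestamp so the
# stored values match the UTC convention used elsewhere in the DB.
self.db_session.execute(
    update(OpportunityChangeAudit)
    .where(OpportunityChangeAudit.opportunity_id.in_(processed_opportunity_ids))
    .values(updated_at=datetime.now(timezone.utc))
)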

mikehgrantsgov merged commit 029ea8e into main on Jan 27, 2025
2 checks passed
mikehgrantsgov deleted the mikehgrantsgov/3527-modify-load-opp-logic-never-delete branch on January 27, 2025, 16:58